1. Introduction

In this exploratory data analysis, our group examines dengue fever cases in the city of Mérida, Mexico, using data provided by Professor Clennon of the Emory Environmental Science Department. Dengue fever is a mosquito-borne viral infection caused by the dengue virus, which poses a significant public health concern in tropical and subtropical regions.

Some of dengue fever’s most common symptoms include sudden drops in blood pressure, nausea, and muscle pain. Researchers have found that Merida, the capital of the state of Yucatán, experiences recurrent dengue outbreaks due to its tropical climate, seasonal rainfall and standing water, and dense urban population density.

Our group is interested in public health, epidemiology, and the positive effect that well-developed data analysis can have on urban welfare. In part, this interest is a result of the COVID-19 pandemic and widespread reporting on its pathology, treatment, and mitigation. Dengue’s effects are similar to COVID-19’s, and the disease similarly impacts communities worldwide. However, its relatively slower spread and its prevalence predominantly in tropical areas with underdeveloped healthcare infrastructure have led to severe neglect compared to other dangerous diseases. This study is primarily an exercise in geospatial and temporal analysis of viral disease.

Our dengue dataset includes 540 rows, each representing a distinct region within Mérida. Each region is quantified based on its coordinates from the Universal Transverse Mercator (UTM) system, a grid-based coordinate system used to map locations using meters. The dataset initially had 8 columns: the X and Y UTM coordinates of each region, SP_ID (an identifier for each region), CVEGEO (geographic identifier code used by the Mexican Geographic Agency), and the number of cases in each region across four different years (2012-2015). As we explain in the Data Cleaning step, SP_ID and CVEGEO are not necessary for our analysis here. Additionally, we have included complementary data on ovitrap egg density (the quantity of mosquito eggs recorded for 4177 ovitraps distributed throughout the city). Although it is not the main focus of our analysis, it provides some insight into potential examination of the relationship between mosquito breeding sites and dengue incidence.

2. Project Setup, Data Loading, and Data Cleaning

Before developing any visualizations or analytical models, we prepared the R environment by loading our primary dataset, “Merida_Den12_13_14_15.csv,” along with a complete suite of R packages. We used the tidyverse package, which provides core tools like dplyr for data transformation and ggplot2 for static visualization. For our more advanced interactive plots, we loaded plotly (used for the animated density map) and leaflet (used for the interactive Year-over-Year map). We also included sf to read and handle the simple features (geospatial) data for Merida. Once all libraries were loaded, we imported the dataset into a data frame. The resulting data frame contained 540 rows and 8 columns, representing spatial identifiers, coordinate values, and annual dengue case counts. This formed the foundation for our exploratory analysis.

Data Cleaning is a crucial step to ensure the integrity and relevance of the data prior to our analysis. We want to ensure that all the data required for this project is clean, validated, and standardized so that our group can create data visualizations using the shared data.

The raw dataset contains several columns, including SP_ID and CVEGEO, which serve as identifiers for position and case record. After discussing the goals for the project, we determined that our exploratory analysis is focused exclusively on the relationship between spatial coordinates and case counts. Therefore, the important features for this analysis are the X and Y coordinates and the annual case count columns (Den2012 through Den2015). The SP_ID and CVEGEOCVEGEO columns, while potentially useful for future work like joining with demographic or climate data, are not required for our current visualization and modeling goals. Therefore, we dropped the project’s irrelevant columns, creating a standard data frame named data_cleaned.

Next, we checked the dataframe for missing data because null values can introduce significant bias, propagate errors in calculations, and cause visualization failures. The R console output confirms that all columns returned 0 null values. This finding validates the completeness of our dataset, allowing us to proceed with our analysis without further cleaning procedures, ensuring our results are based on the full set of observations.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
library(leaflet) 
library(sf) 
## Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE
library(htmltools)
library(scales)
## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor
data <- read.csv('Merida_Den12_13_14_15.csv')

head(data)
##          X       Y SP_ID        CVEGEO Den2012 Den2013 Den2014 Den2015
## 1 231639.6 2317510     0 3104100010070      17       6       2       2
## 2 231323.1 2316622     1 3104100010085      35      15      10      18
## 3 230769.7 2316113     2 310410001009A      35      14       8      17
## 4 233250.8 2316208     3 3104100010136      40       8       6      10
## 5 233266.3 2317442     4 3104100010140      12       8       4       4
## 6 232736.6 2317238     5 3104100010155      30       8       1       5
data_cleaned <- data %>%
  select(X, Y, starts_with("Den"))
na_counts <- colSums(is.na(data_cleaned))
print(na_counts)
##       X       Y Den2012 Den2013 Den2014 Den2015 
##       0       0       0       0       0       0
head(data_cleaned)
##          X       Y Den2012 Den2013 Den2014 Den2015
## 1 231639.6 2317510      17       6       2       2
## 2 231323.1 2316622      35      15      10      18
## 3 230769.7 2316113      35      14       8      17
## 4 233250.8 2316208      40       8       6      10
## 5 233266.3 2317442      12       8       4       4
## 6 232736.6 2317238      30       8       1       5

4. Exploratory Analysis 2: Ovitrap Data

Before we further analyze dengue incidence over time, let’s briefly look at some data for mosquito ovitraps. Ovitraps allow researchers to estimate the number of mosquitoes in an area by replicating breeding environments and measuring egg count. As dengue is a mosquito-borne illness, we should keep in mind the relative density of mosquito eggs as we look at disease hot spots later on. For the purposes of this analysis, we will not run statistical tests examining the exact assocation between estimated mosquito density and dengue incidence, as we intend to focus more on geospatial and temporal analytics.

map <- st_read("Merida_S25/Merida_Den12_13_14_15/Merida_Den12_13_14_15.shp")
## Reading layer `Merida_Den12_13_14_15' from data source 
##   `/Users/amanshaik/Documents/GitHub/dengueAnalysisTechWriting/Merida_S25/Merida_Den12_13_14_15/Merida_Den12_13_14_15.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 540 features and 6 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 212287.2 ymin: 2309495 xmax: 236386 ymax: 2333181
## Projected CRS: Mexico ITRF2008 / UTM zone 16N
ovitraps <- st_read("Merida_S25/Ovitraps_Sum/Ovitraps_Sum.shp")
## Reading layer `Ovitraps_Sum' from data source 
##   `/Users/amanshaik/Documents/GitHub/dengueAnalysisTechWriting/Merida_S25/Ovitraps_Sum/Ovitraps_Sum.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 4177 features and 50 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 219360.9 ymin: 2312502 xmax: 233659.3 ymax: 2329610
## Projected CRS: Mexico ITRF2008 / UTM zone 16N
# Create the plot with a gradient from yellow to red, representing relatively low to relatively high egg density. Overlay the plot on a blank regional map of Merida.
ggplot() +
  geom_sf(data = map, color = "black", linewidth = 0.3) +                         # Create map outline
  geom_sf(data = ovitraps, aes(color = Total), size = 2, alpha = 0.9) +           # Generate ovitrap points
  scale_color_gradient(low = "yellow", high = "red", name = "Eggs per Trap") +    # Add color gradient to represent eggs per trap
  labs(title = "Ovitrap Egg Counts in Merida")                                    # Add title

6. Exploratory Analysis 4: Interactive Case Count Map

# 1. Convert data to sf and transform to WGS84 for Leaflet
# We use EPSG:6371 as identified in the original Section 7
data_sf <- st_as_sf(data, coords = c("X", "Y"), crs = 6371)
data_sf_wgs84 <- st_transform(data_sf, crs = 4326)

# 2. Load and transform the neighborhood shapefile
town_map_raw <- st_read("Merida_S25/Merida_Den12_13_14_15/Merida_Den12_13_14_15.shp")
## Reading layer `Merida_Den12_13_14_15' from data source 
##   `/Users/amanshaik/Documents/GitHub/dengueAnalysisTechWriting/Merida_S25/Merida_Den12_13_14_15/Merida_Den12_13_14_15.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 540 features and 6 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 212287.2 ymin: 2309495 xmax: 236386 ymax: 2333181
## Projected CRS: Mexico ITRF2008 / UTM zone 16N
town_map_wgs84 <- st_transform(town_map_raw, crs = 4326)

# 3. Create a sequential color palette for absolute cases
# We find the global max to keep the scale consistent across years
all_cases <- c(data_sf_wgs84$Den2012, data_sf_wgs84$Den2013, data_sf_wgs84$Den2014, data_sf_wgs84$Den2015)
global_max_cases <- max(all_cases, na.rm = TRUE)
pal_cases <- colorNumeric(
  palette = "Reds",
  domain = c(0, global_max_cases),
  na.color = "#a0a0a0"
)

# 4. Create popups for each year
popup_2012 <- ~lapply(sprintf("<strong>ID: %s</strong><br/>2012 Cases: %d", SP_ID, Den2012), HTML)
popup_2013 <- ~lapply(sprintf("<strong>ID: %s</strong><br/>2013 Cases: %d", SP_ID, Den2013), HTML)
popup_2014 <- ~lapply(sprintf("<strong>ID: %s</strong><br/>2014 Cases: %d", SP_ID, Den2014), HTML)
popup_2015 <- ~lapply(sprintf("<strong>ID: %s</strong><br/>2015 Cases: %d", SP_ID, Den2015), HTML)

# 5. Create the interactive leaflet map
m_cases <- leaflet(data_sf_wgs84) %>%
  # Add base map layers
  addProviderTiles(providers$CartoDB.Positron, group = "Light Map") %>%
  addProviderTiles(providers$Esri.WorldImagery, group = "Satellite") %>%
  
  # Add Neighborhoods layer
  addPolygons(
    data = town_map_wgs84,
    group = "Neighborhoods",
    fillOpacity = 0.1,
    color = "#444",
    weight = 1,
    opacity = 0.7,
    popup = ~SP_ID
  ) %>%
  
  # Set the map to focus on the data
  fitBounds(
    ~as.numeric(st_bbox(geometry)[1]),
    ~as.numeric(st_bbox(geometry)[2]),
    ~as.numeric(st_bbox(geometry)[3]),
    ~as.numeric(st_bbox(geometry)[4])
  ) %>%

  # ==== Layer 1: 2012 Cases ====
  addCircleMarkers(
    group = "2012 Cases",
    color = ~pal_cases(Den2012),
    fillOpacity = 0.8,
    stroke = FALSE,
    radius = 5,
    popup = popup_2012,
    label = ~as.character(Den2012)
  ) %>%
  
  # ==== Layer 2: 2013 Cases ====
  addCircleMarkers(
    group = "2013 Cases",
    color = ~pal_cases(Den2013),
    fillOpacity = 0.8,
    stroke = FALSE,
    radius = 5,
    popup = popup_2013,
    label = ~as.character(Den2013)
  ) %>%
  
  # ==== Layer 3: 2014 Cases ====
  addCircleMarkers(
    group = "2014 Cases",
    color = ~pal_cases(Den2014),
    fillOpacity = 0.8,
    stroke = FALSE,
    radius = 5,
    popup = popup_2014,
    label = ~as.character(Den2014)
  ) %>%
  
  # ==== Layer 4: 2015 Cases ====
  addCircleMarkers(
    group = "2015 Cases",
    color = ~pal_cases(Den2015),
    fillOpacity = 0.8,
    stroke = FALSE,
    radius = 5,
    popup = popup_2015,
    label = ~as.character(Den2015)
  ) %>%
  
  # ==== Legend ====
  addLegend(
    pal = pal_cases,
    values = c(0, global_max_cases),
    title = "Case Count",
    position = "bottomright",
    opacity = 1
  ) %>%

  # ==== Layer Control ====
  addLayersControl(
    baseGroups = c("2012 Cases", "2013 Cases", "2014 Cases", "2015 Cases"),
    overlayGroups = c("Light Map", "Satellite", "Neighborhoods"),
    options = layersControlOptions(collapsed = FALSE)
  )

# Display the map
m_cases

While the animated density plot in the previous section provided an excellent overview of the temporal flow, this interactive leaflet map offers another perspective that adds more context for our analysis. By overlaying the case data on real world maps of both the light map as well as the satelite imagery map and adding the ‘Neighborhoods’ shapefile, we can move from abstract coordinates to tangible locations. The layer controls which function as a set of radio buttons to select the basemap to have the data overlayed upon allow for a contextually rich understanding of absolute case counts for each year.

Using this map confirms and deepens our previous findings. The 2012 Cases layer shows the initial outbreak’s hot spots clearly, colored in dark red. Looking at the “Neighborhoods” overlay reveals these are concentrated in specific districts. When switching to the 2013 Cases and 2014 Cases layers, we can observe the city wide decline. The dark red dots (high case counts) almost vanish from the map, reinforcing the narrative of a successful containment period that was first identified in the aggregate bar chart.

The 2015 Cases layer highlights the dramatic resurgence. We can now pinpoint the exact neighborhoods that experienced this new outbreak, which appear to be both a re emergence in some of the 2012 hot spots and an expansion into new adjacent areas. This map is invaluable for understanding the where of the outbreak. However, it only shows absolute counts. To understand the speed and volatility of the spread, we next need to analyze the relative change from one year to the next.

7. Exploratory Analysis 5: Analyzing Year-over-Year (YoY) Change

# 1. Calculate Year-over-Year (YoY) percentage change
# We must handle cases where the denominator (previous year) is 0.
calculate_yoy <- function(current, previous) {
  case_when(
    previous == 0 & current == 0 ~ 0,  # 0 to 0 is 0% change
    previous == 0 & current > 0 ~ Inf, # 0 to N is infinite % change
    previous > 0 ~ (current - previous) / previous
  )
}

data_yoy <- data %>%
  mutate(
    YoY_2013 = calculate_yoy(Den2013, Den2012),
    YoY_2014 = calculate_yoy(Den2014, Den2013),
    YoY_2015 = calculate_yoy(Den2015, Den2014)
  )

# 2. Clean up Inf/-Inf/NaN values for visualization. We'll set Inf to NA.
data_yoy <- data_yoy %>%
  mutate(across(starts_with("YoY_"), ~if_else(is.finite(.), ., NA_real_)))

# 3. Convert to a geospatial 'sf' object and transform coordinates
# The .prj file (EPSG:6371) tells us this is Mexico ITRF2008 / UTM zone 16N.
# Leaflet requires standard WGS84 (EPSG:4326) coordinates (lat/lon).
data_sf <- st_as_sf(data_yoy, coords = c("X", "Y"), crs = 6371)
data_sf_wgs84 <- st_transform(data_sf, crs = 4326)

# 4. Define a diverging color palette
# We cap the domain from -1 (-100%) to 2 (200%) to handle outliers
cap_range <- c(-1, 2)
pal_yoy <- colorNumeric(
  palette = "RdYlBu", 
  domain = cap_range, 
  na.color = "#a0a0a0",
  reverse = TRUE # Make red positive (increase), blue negative (decrease)
)

# 5. Create formatted popup and label content
# Popups appear on click
popup_2013 <- ~lapply(sprintf(
  "<strong>Location ID: %s</strong><br/>2012 Cases: %d<br/>2013 Cases: %d<br/>YoY Change: %s",
  SP_ID, Den2012, Den2013, scales::percent(YoY_2013, accuracy = 0.1)
), HTML)

popup_2014 <- ~lapply(sprintf(
  "<strong>Location ID: %s</strong><br/>2013 Cases: %d<br/>2014 Cases: %d<br/>YoY Change: %s",
  SP_ID, Den2013, Den2014, scales::percent(YoY_2014, accuracy = 0.1)
), HTML)

popup_2015 <- ~lapply(sprintf(
  "<strong>Location ID: %s</strong><br/>2014 Cases: %d<br/>2015 Cases: %d<br/>YoY Change: %s",
  SP_ID, Den2014, Den2015, scales::percent(YoY_2015, accuracy = 0.1)
), HTML)

# Labels appear on hover
label_2013 <- ~scales::percent(YoY_2013, accuracy = 0.1)
label_2014 <- ~scales::percent(YoY_2014, accuracy = 0.1)
label_2015 <- ~scales::percent(YoY_2015, accuracy = 0.1)
# Load and transform the neighborhood shapefile
town_map_raw <- st_read("Merida_S25/Merida_Den12_13_14_15/Merida_Den12_13_14_15.shp")
## Reading layer `Merida_Den12_13_14_15' from data source 
##   `/Users/amanshaik/Documents/GitHub/dengueAnalysisTechWriting/Merida_S25/Merida_Den12_13_14_15/Merida_Den12_13_14_15.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 540 features and 6 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 212287.2 ymin: 2309495 xmax: 236386 ymax: 2333181
## Projected CRS: Mexico ITRF2008 / UTM zone 16N
town_map_wgs84 <- st_transform(town_map_raw, crs = 4326)

# Create the interactive leaflet map
m <- leaflet(data_sf_wgs84) %>%
  # Add base map layers
  addProviderTiles(providers$CartoDB.Positron, group = "Light Map") %>%
  addProviderTiles(providers$Esri.WorldImagery, group = "Satellite") %>%
  
  # Add Neighborhoods layer
  addPolygons(
    data = town_map_wgs84,
    group = "Neighborhoods",
    fillOpacity = 0.1,
    color = "#444",
    weight = 1,
    opacity = 0.7,
    popup = ~~SP_ID
  ) %>%
  
  # Set the map to focus on the data
  fitBounds(
    ~as.numeric(st_bbox(geometry)[1]),
    ~as.numeric(st_bbox(geometry)[2]),
    ~as.numeric(st_bbox(geometry)[3]),
    ~as.numeric(st_bbox(geometry)[4])
  ) %>%

  # ==== Layer 1: 2013 vs 2012 ====
  addCircleMarkers(
    group = "2013 vs 2012",
    color = ~pal_yoy(YoY_2013),
    fillOpacity = 0.8,
    stroke = FALSE,
    radius = 5,
    popup = popup_2013,
    label = label_2013
  ) %>%
  
  # ==== Layer 2: 2014 vs 2013 ====
  addCircleMarkers(
    group = "2014 vs 2013",
    color = ~pal_yoy(YoY_2014),
    fillOpacity = 0.8,
    stroke = FALSE,
    radius = 5,
    popup = popup_2014,
    label = label_2014
  ) %>%

  # ==== Layer 3: 2015 vs 2014 ====
  addCircleMarkers(
    group = "2015 vs 2014",
    color = ~pal_yoy(YoY_2015),
    fillOpacity = 0.8,
    stroke = FALSE,
    radius = 5,
    popup = popup_2015,
    label = label_2015
  ) %>%
  
  # ==== Legend ====
  addLegend(
    pal = pal_yoy,
    values = cap_range,
    title = "YoY % Change",
    position = "bottomright",
    labFormat = labelFormat(suffix = "%", transform = function(x) x * 100),
    opacity = 1
  ) %>%

  # ==== Layer Control ====
  addLayersControl(
    baseGroups = c("2013 vs 2012", "2014 vs 2013", "2015 vs 2014"),
    overlayGroups = c("Light Map", "Satellite", "Neighborhoods"),
    options = layersControlOptions(collapsed = FALSE)
  )
## Warning in pal_yoy(YoY_2013): Some values were outside the color scale and will
## be treated as NA
## Warning in pal_yoy(YoY_2013): Some values were outside the color scale and will
## be treated as NA
## Warning in pal_yoy(YoY_2014): Some values were outside the color scale and will
## be treated as NA
## Warning in pal_yoy(YoY_2014): Some values were outside the color scale and will
## be treated as NA
## Warning in pal_yoy(YoY_2015): Some values were outside the color scale and will
## be treated as NA
## Warning in pal_yoy(YoY_2015): Some values were outside the color scale and will
## be treated as NA
# Display the map
m

It is not enough to only look at the absolute numbers for the outbreak of dengue fever. We must also look at the changes through time to identify whether there are any temporal trends. Health officals would want to measure key insights such as 50% decrease in a major hotspot or a 500% increase in a new previously safe area. This map visualizes the Year over Year YoY percentage change to provide these temporal insights. We represent these changes by using the color pallets of data points on the map with red points to signify a percentage increase in worsening outbreak while using blue points to show a percentage decrease which represents improving situation compared to the previous year. Finally, grey points indicate areas with no change or no cases in either period of the outbreak.

The first layer 2013 vs 2012 visualizes the success of containment efforts after the 2012 outbreak. These trends agree with our findings in section 3 of our analysis where we see a massive decrease in the amount of outbreaks which is also represented here with large quantities of blue points. The 2014 vs 2013 layer shows a more mixed but still generally positive story. Many areas that were blue in the previous layer are now grey indicating they went from some cases to zero or stayed at zero. The overall trend remains one of decline or stabilization which is as expected as the amount decrease is limited compared to the first outbreak year.

The 2015 vs 2014 layer though is showing a different story that starkly contrasts the previous trends. The map is now covered in bright red dots with many representing infinite or extremely high percentage increases as cases jumped from near 0 in 2014 to significant numbers in 2015. This visualization starkly contrasts with the absolute case map of Section 6. While that map showed where the 2015 hotspots were, this map shows the surprising amount of increase of dengue fever occurences representing another outbreak. This highlights the virus’s ability to resurge unexpectedly turning previously low risk zones into new hotspots quickly.

8. Conclusion and Future Questions

In our exploratory analysis, we successfully identified the spatial and temporal dynamics of Dengue fever in Mérida from 2012 to 2015. Our data revealed that a significant 2012 outbreak was followed by a two-year period of decline (2013-2014), allowing us to hypothesize that there was a successful containment. However, our findings in the Annual Case Trend Plot (Section 3) and the Animated Spatial Distribution Map (Section 5) indicate there was a powerful and abrupt resurgence in 2015.

Our geospatial analyses demonstrated that this 2015 outbreak was not a random event. The animated map revealed that the new hotspots were persistent and spatially clustered, emerging within or immediately adjacent to the original 2012 outbreak zones. The Year-over-Year (YoY) analysis (Section 7) supports this, showing a volatile spread into previously low-risk areas, highlighting the virus’s ability to resurge. These visualizations strongly support our primary conclusion that Dengue fever spreading in this urban environment is not random but a persistent, spatially driven phenomenon. The analysis of ovitrap data (Section 4) provides a preliminary, data-driven hypothesis for why these high-risk zones correlate with high vector (mosquito) density, suggesting they act as reservoirs for the virus even during “quiet” years.

This exploratory analysis can be further adapted for several specific, data-driven research focuses that move from what happened to why it happened. In future work, predictive modeling can be used to analyze how the 2012 hotspots (Section 5) correlate with the specific locations of the 2015 resurgence. Additionally, with increased access to ovitrap data, we can further analyze the ovitrap egg counts (Section 4) and the Dengue Fever case counts (Section 6) at the same neighborhood level. Having larger datasets with more available factors can also help identify if other variables contributed to the timing of the 2015 resurgence. Access to data on population, monthly precipitation, temperature, and drought regions can help identify specific triggers that activate these persistent, high-risk zones.

Overall, this study highlights the value of integrating temporal and spatial analyses to uncover patterns of disease persistence and resurgence in urban environments, while providing a foundation for data-driven public health and safety strategies.